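This article surveys the literature on human-robot object handovers. A handover is a collaborative joint action in which one agent, the giver, gives an object to another agent, the receiver. The physical exchange starts when the receiver first contacts the object held by the giver and ends when the giver fully releases the object to the receiver. However, important cognitive and physical processes begin before the physical exchange, including initiating an implicit agreement on the location and timing of the exchange. From this perspective, we structure our review into the two main phases delimited by these events: 1) a pre-handover phase and 2) the physical exchange. We focus our analysis on the two actors (giver and receiver) and report the state of the art for robotic givers (robot-to-human handovers) and robotic receivers (human-to-robot handovers). We report a comprehensive list of qualitative and quantitative metrics commonly used to assess the interaction. While our review focuses on the cognitive level (e.g., prediction, perception, motion planning, learning) and the physical level (e.g., motion, grasping, grip release) of the handover, we also briefly discuss the concepts of safety, social context, and ergonomics. We compare the behaviors displayed during human-to-human handovers with the state of the art of robotic assistants, and identify the major areas of improvement needed for robotic assistants to reach performance comparable to human interactions. Finally, we propose a minimal set of metrics that should be used to enable a fair comparison among approaches.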
translated by 谷歌翻译
Random graph models with community structure have been studied extensively in the literature. For both the problems of detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of communities) even in situations where actually finding those communities is believed to be computationally hard? We show the answer is no. In particular, we consider certain hypothesis testing problems between models with different community structures, and we show (in the low-degree polynomial framework) that testing between two options is as hard as finding the communities. In addition, our methods give the first computational lower bounds for testing between two different 'planted' distributions, whereas previous results have considered testing between a planted distribution and an i.i.d. 'null' distribution.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
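As a rough illustration of the K-fold cross-validation mentioned in the abstract above, the sketch below partitions sample indices into training and validation folds. The abstract does not describe how participants implemented it, so the function name `k_fold_indices`, the unshuffled contiguous folds, and the sample count are all assumptions made for illustration.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, validation) index pairs.

    Hypothetical helper: folds are contiguous and unshuffled, and the
    first (n_samples % k) folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        # Every sample outside the validation fold is used for training.
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


# Illustrative numbers only: 10 training samples, 3 folds.
splits = k_fold_indices(10, 3)
validation_folds = [validation for _, validation in splits]
```

Each sample appears in exactly one validation fold, so every training sample is used for validation once across the k runs. In practice the indices would usually be shuffled first, which this sketch omits for clarity.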
Human operators in human-robot teams are commonly perceived to be critical for mission success. To explore the direct and perceived impact of operator input on task success and team performance, 16 real-world missions (10 hrs) were conducted based on the DARPA Subterranean Challenge. These missions were to deploy a heterogeneous team of robots for a search task to locate and identify artifacts such as climbing rope, drills and mannequins representing human survivors. Two conditions were evaluated: human operators that could control the robot team with state-of-the-art autonomy (Human-Robot Team) compared to autonomous missions without human operator input (Robot-Autonomy). Human-Robot Teams were often in directed autonomy mode (70% of mission time), found more items, traversed more distance, covered more unique ground, and had a higher time between safety-related events. Human-Robot Teams were faster at finding the first artifact, but slower to respond to information from the robot team. In routine conditions, scores were comparable for artifacts, distance, and coverage. Reasons for intervention included creating waypoints to prioritise high-yield areas, and to navigate through error-prone spaces. After observing robot autonomy, operators reported increases in robot competency and trust, but that robot behaviour was not always transparent and understandable, even after high mission performance.
Monocular Depth Estimation (MDE) is a fundamental problem in computer vision with numerous applications. Recently, LIDAR-supervised methods have achieved remarkable per-pixel depth accuracy in outdoor scenes. However, significant errors are typically found in the proximity of depth discontinuities, i.e., depth edges, which often hinder the performance of depth-dependent applications that are sensitive to such inaccuracies, e.g., novel view synthesis and augmented reality. Since direct supervision for the location of depth edges is typically unavailable in sparse LIDAR-based scenes, encouraging the MDE model to produce correct depth edges is not straightforward. In this work we propose to learn to detect the location of depth edges from densely-supervised synthetic data, and use it to generate supervision for the depth edges in the MDE training. Despite the 'domain gap' between synthetic and real data, we show that depth edges that are estimated directly are significantly more accurate than the ones that emerge indirectly from the MDE training. To quantitatively evaluate our approach, and due to the lack of depth edges ground truth in LIDAR-based scenes, we manually annotated subsets of the KITTI and the DDAD datasets with depth edges ground truth. We demonstrate significant gains in the accuracy of the depth edges with comparable per-pixel depth accuracy on several challenging datasets.
Understanding pedestrian behavior patterns is a key component to building autonomous agents that can navigate among humans. We seek a learned dictionary of pedestrian behavior to obtain a semantic description of pedestrian trajectories. Supervised methods for dictionary learning are impractical since pedestrian behaviors may be unknown a priori and the process of manually generating behavior labels is prohibitively time consuming. We instead utilize a novel, unsupervised framework to create a taxonomy of pedestrian behavior observed in a specific space. First, we learn a trajectory latent space that enables unsupervised clustering to create an interpretable pedestrian behavior dictionary. We show the utility of this dictionary for building pedestrian behavior maps to visualize space usage patterns and for computing the distributions of behaviors. We demonstrate a simple but effective trajectory prediction by conditioning on these behavior labels. While many trajectory analysis methods rely on RNNs or transformers, we develop a lightweight, low-parameter approach and show results comparable to SOTA on the ETH and UCY datasets.
We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices.
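Deep neural networks have been shown to be vulnerable to adversarial attacks that perturb inputs based on semantic features. Existing robustness analyzers can reason about semantic feature neighborhoods to increase the networks' reliability. However, despite significant progress in these techniques, they still struggle to scale to deep networks and large neighborhoods. In this work, we introduce VeeP, an active learning approach that splits the verification process into a series of smaller verification steps, each of which is submitted to an existing robustness analyzer. The key idea is to build on prior steps to predict the next optimal step. The optimal step is predicted by estimating the certification velocity and sensitivity via parametric regression. We evaluate VeeP on MNIST, Fashion-MNIST, CIFAR-10, and ImageNet and show that it can analyze neighborhoods of various features: brightness, contrast, hue, saturation, and lightness. We show that, on average, given a 90-minute timeout, VeeP verifies 96% of the maximally certifiable neighborhoods within 29 minutes, while existing splitting approaches verify, on average, 73% of the maximally certifiable neighborhoods within 58 minutes.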
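Deep reinforcement learning (DRL) has made tremendous advances in simulated robot control tasks in recent years. Nevertheless, applying DRL to novel robot control tasks remains challenging, especially when researchers have to design the action and observation spaces and the reward function. In this paper, we investigate partial observability as a potential source of failure when applying DRL to robot control tasks, which can occur when researchers are not confident that the observation space fully represents the underlying state. We compare the performance of three common DRL algorithms, TD3, SAC, and PPO, under various partial observability conditions. We find that TD3 and SAC easily become stuck in local optima and underperform PPO. We propose multi-step versions of vanilla TD3 and SAC, which are based on one-step bootstrapping, to improve robustness to partial observability.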
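To make the contrast between one-step bootstrapping and a multi-step variant concrete, the sketch below computes a generic n-step return target. It is not taken from the paper: the helper name `n_step_target`, the reward sequence, the critic values, and the discount factor are assumptions chosen for illustration, and the actual multi-step TD3 and SAC formulations may differ (for example, SAC also includes an entropy term).

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    Hypothetical helper: `bootstrap_value` stands for the critic's estimate
    at the state reached after the n = len(rewards) observed steps.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Illustrative numbers only.
rewards = [1.0, 0.0, 2.0]
gamma = 0.5

# One-step bootstrapping: one observed reward plus the critic value one step ahead.
one_step = n_step_target(rewards[:1], bootstrap_value=5.0, gamma=gamma)

# Three-step target: three observed rewards plus the critic value three steps ahead.
three_step = n_step_target(rewards, bootstrap_value=4.0, gamma=gamma)
```

With n = 1 the target leans heavily on the critic's estimate, whereas larger n gives more weight to observed rewards and discounts the bootstrapped value by gamma^n.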
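Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and a best subset selection problem.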